In this report I’ll give a short overview of the work done during the
data science project in
Programming Languages for Data Science. I worked on the
whole Data Science Lifecycle from the idea to the final
insights and model.
The topic is about online auctions using an online
game called Guild Wars 2 as an example. The data was
fetched from the Web API provided by the game developers. We talk about
around 5.000.000 datasets so I had to concentrate on the
most valuable items first.
The goal was to find out if there are items that can be bought low and sold high to earn (virtual) money. This was pretty much possible with data analysis and visualization. Of cause those items are changing very fast, but as a conclusion I can say it’s not possible to buy and sell “the one item”, the gains are pretty low and it’s only worth it when buying items in bulk.
As the data pretty much contains all data and doesn’t yield a real question, for the model part, I wanted to find out and predict at which rate an item can be sold. The sober consideration is that is only depends on the buying price which selling price can be achieved. All other attributes like type, rarity or required level of the item did not have any effect on the outcome.
Malcolm Forbes once said, “Money isn’t everything as long as you have enough of it”. Sure, money doesn’t buy happiness, but it can buy a lot of things. This is not only true for the real world, but especially in online games, where micro transactions and payed services have become the new normal.
More and more games give you the choice to invest a lot of time or a lot of money. Both ways you’ll somehow achieve the goal of the game faster. In some games you can even exchange digital goods for real money.
To get to the point, having digital currency in games can save you time, stress and even real money.
There are multiple ways to earn money in games. One way may be farming, which means hunting special items in high amounts and selling them to other players. Here’s the question, which items are worth collecting and can be sold for which prices?
Another way is buying items from other players and reselling them with a higher price tag, just like in the real world. Many games have auction houses where those transactions can be made. For this type of income one needs to know what to buy when, which prices are low or high, and when to sell for which amount.
For this project my example of game will be Guild Wars 2. Guild Wars 2 was published in 2012 by NCSoft. There are no monthly fees and the basic game is free to play since 2015. It is a massively multiplayer online role-playing game, meaning a lot of people playing in parallel in an online world. There are over 20.000 items that can be collected and most of them can be sold and bought at the auction house, called the trading post.
The trading post
When selling items, 5% of the cost is a fee for the auction house and gets immediately taken from the wallet. Each time an item is placed in the auction house, this 5% fee must be payed, so it’s best if the items get sold on the first run.
When buying items, 10% of the price goes to the bank before the rest is delivered to the seller.
Because of that it’s important to find items where the buy-to-sell price ratio is the highest. In addition, as we buy and sell, we have to pay the whole 15% taxes which lowers our revenue.
Here is one example:
This is how it works:
Overall, we invested 1.05 gold and earned
1.8 gold. This is a revenue of (only)
0.75 gold.
Coins are the basic currency
Just like in the real world, where 100 cents are 1 Euro, the main currency in Guild Wars, coins, are separated into three units. There is copper, silver and gold.
The following values are all the same and are automatically calculated by the game: ` 10.000 copper == 100 silver == 1 gold.
For this project, data is generally displayed in gold.
We need to get the data first. We scrap the data from
https://api.guildwars2.com/v2/commerce/. First we get all
26800 items and, in batches of 200 items per
request, get all auctions for these items. It’s really important to use
the batch endpoint to keep the runtime to a reasonable duration.
dir <- getwd()
date <- params$data_date
price_list_buys <- read.csv(paste(dir, "/../data/raw/gw2-all-buys-raw-", date, ".csv", sep = ""))
price_list_sells <- read.csv(paste(dir, "/../data/raw/gw2-all-sells-raw-", date, ".csv", sep = ""))
item_list <- read.csv(paste(dir, "/../data/raw/gw2-all-items-raw-", date, ".csv", sep = ""))
df_buys <- item_list %>% left_join(price_list_buys, by = "id")
df_sells <- item_list %>% left_join(price_list_sells, by = "id")
rm(dir)
rm(date)
rm(item_list)
rm(price_list_sells)
rm(price_list_buys)
First we clean the data and change some feature types. As already said, most of the time we use the price in gold, so we add this feature.
df_sells <- df_sells %>%
drop_na(unit_price, quantity) %>%
mutate(rarity = as.factor(rarity),
type = as.factor(type),
item_type = as.factor(item_type),
item_weight_class = as.factor(item_weight_class),
unit_price_gold = unit_price / 10000) %>%
select(-unit_price)
df_buys <- df_buys %>%
drop_na(unit_price, quantity) %>%
mutate(rarity = as.factor(rarity),
type = as.factor(type),
item_type = as.factor(item_type),
item_weight_class = as.factor(item_weight_class),
unit_price_gold = unit_price / 10000) %>%
select(-unit_price)
After acquiring the data this way we have one file with all the items, one with all the sell orders (3855399) and one with all the buy orders (around 434399). When joining those files together, the amount of data would be enormous.
Therefore some filtering is required first. We only take the highest buy orders and the lowest sell orders, because these are the first ones to be bought/sold.
df_max_buys <- df_buys %>%
group_by(name) %>%
slice(which.max(unit_price_gold))
df_min_sells <- df_sells %>%
group_by(name) %>%
slice(which.min(unit_price_gold))
Now we can join the sell and buy orders and calculate the profit per item according to the tax rules described earlier.
df_all <- df_max_buys %>%
mutate(quantity_buys = quantity,
unit_price_gold_buys = unit_price_gold) %>%
select(-quantity, -unit_price_gold) %>%
right_join(df_min_sells %>%
mutate(quantity_sells = quantity,
unit_price_gold_sells = unit_price_gold) %>%
select(id, quantity_sells, unit_price_gold_sells), by = "id") %>%
mutate(name = name.x) %>%
select(-name.x, -name.y)
df_all <- df_all %>%
mutate(unit_price_gold_diff = unit_price_gold_sells - unit_price_gold_buys,
profit = 0.85 * unit_price_gold_sells - unit_price_gold_buys,
more_sells = quantity_sells - quantity_buys)
As we can see there are some very high outliers and a lot of outliers between around 100 and 3000 gold. We need to zoom in much more to see the details.
df_all %>%
ggplot() +
geom_boxplot(aes(x = 'Sells', y = unit_price_gold_sells)) +
geom_boxplot(aes(x = 'Buys', y = unit_price_gold_buys)) +
geom_boxplot(aes(x = 'Profit', y = profit)) +
geom_hline(yintercept = 3000, linetype="dashed", color = "red") +
geom_hline(yintercept = 100, linetype="dashed", color = "blue") +
scale_y_continuous(labels = comma) +
labs(title = "Outliers on buys and sells", subtitle = "Baseline between 100 and 3000 gold",
x = "", y = "Price in gold", caption = paste("Data from", params$data_date))
## Warning: Removed 3756 rows containing non-finite values (stat_boxplot).
## Removed 3756 rows containing non-finite values (stat_boxplot).
df_all %>%
subset(profit < 2.5 & unit_price_gold_sells < 2.5) %>%
ggplot() +
geom_boxplot(aes(x = 'Sells', y = unit_price_gold_sells)) +
geom_text(aes(x = 'Sells', y = median(unit_price_gold_sells), label = median(unit_price_gold_sells)), size = 3, vjust = -1) +
geom_boxplot(aes(x = 'Profit', y = profit)) +
geom_text(aes(x = 'Profit', y = median(profit), label = median(profit)), size = 3, vjust = -0.5) +
scale_y_continuous(labels = comma) +
labs(title = "Outliers on profit and sells", subtitle = "Limit at 2.5 gold profit and sell price",
x = "", y = "Price in gold", caption = paste("Data from", params$data_date))
median_profit_silver <- round(100 * df_all %>%
drop_na(profit) %>%
summarise(median(profit)) %>%
first(), 2)
median_profit_silver_filter <- round(100 * df_all %>%
subset(profit < 2.5 & unit_price_gold_sells < 2.5) %>%
drop_na(profit) %>%
summarise(median(profit)) %>%
first(), 2)
The median profit is not in the gold range, but at 14.37 silver. When filtering with reasonable bounds, the profit even falls down to 4.11 silver.
We can now strip down the data even further.
Let’s take only the items with a realistic profit and strip away the items where it would be better to sell them to the non-player vendor instead of placing them in the auction house.
df_all <- df_all %>%
subset(profit > 0.04 & profit < 0.4)
df_all <- df_all %>%
subset(profit * 100 > vendor_value) %>%
arrange(desc(profit))
Now only 293 items left. Let’s have a closer look at those items.
TODO: Summary row with mean doesn’t make much sense. Maybe change prices to silver instead of gold.
df_all %>%
group_by(type, rarity) %>%
summarise(name = unique(name),
profit = profit * 100,
icon = min(web_image(icon, height = 50))
) %>%
drop_na() %>%
arrange(desc(profit)) %>%
gt(rowname_col = "name") %>%
tab_header(title = "Realistic top profit items", subtitle = "") %>%
fmt_number(
columns = profit,
suffixing = "S"
) %>%
fmt_markdown(
columns = icon
) %>%
summary_rows(
groups = TRUE,
columns = profit,
fns = list(average = "mean"),
formatter = fmt_number
) %>%
tab_footnote(
footnote = "Prices in silver",
locations = cells_column_labels(columns = profit)
) %>%
tab_source_note(
"Based on data from api.guildwars2.com"
) %>%
tab_options(
summary_row.background.color = "#ACEACE",
row_group.background.color = "#FFEFDB",
table.layout = "auto",
container.overflow.x = TRUE,
container.height = px(350)
)
| Realistic top profit items | ||
|---|---|---|
| profit1 | icon | |
| CraftingMaterial - Basic | ||
| Oiled Orichalcum Sword Hilt | 39.98 | |
| Oiled Orichalcum Horn | 39.65 | |
| Warbeast Orichalcum Pauldron Casing | 38.33 | |
| Oiled Small Ancient Haft | 38.11 | |
| Warbeast Hardened Helmet Strap | 35.07 | |
| Rugged Longcoat Panel | 34.98 | |
| Warbeast Gossamer Pant Lining | 34.69 | |
| Oiled Orichalcum Shield Backing | 34.56 | |
| Warbeast Orichalcum Gauntlet Plates | 31.93 | |
| Oiled Orichalcum Boot Casing | 31.05 | |
| Warbeast Hardened Shoulderguard Padding | 29.61 | |
| Coarse Longcoat Panel | 28.67 | |
| Hardened Boot Upper | 24.58 | |
| Steel Splint Chestplate Panel | 20.91 | |
| Wool Footwear Upper | 20.34 | |
| Wool Vestments Panel | 20.02 | |
| Rugged Trouser Panel | 18.87 | |
| Elonian Boot Sole | 18.30 | |
| Iron Spear Head | 18.20 | |
| Oiled Orichalcum Sword Blade | 17.49 | |
| Elonian Trouser Padding | 17.44 | |
| Darksteel Legging Panel | 17.18 | |
| Orichalcum Chestplate Panel | 16.73 | |
| Rugged Glove Panel | 15.56 | |
| Darksteel Pistol Barrel | 14.39 | |
| Oiled Ancient Rifle Stock | 13.75 | |
| Green Harpoon | 13.52 | |
| Gossamer Coat Panel | 13.39 | |
| Thick Longcoat Panel | 13.33 | |
| Iron Scale Legging Panel | 13.20 | |
| Rawhide Chestguard Panel | 13.09 | |
| Iron Mace Head | 12.94 | |
| Iron Scale Armguard Panel | 12.68 | |
| Ancient Longbow Stave | 12.51 | |
| Iron Axe Blade | 12.46 | |
| Bronze Mace Head | 12.01 | |
| Orichalcum Dagger Blade | 11.58 | |
| Gossamer Epaulet Panel | 11.46 | |
| Coarse Trouser Panel | 11.34 | |
| Steel Sword Blade | 11.10 | |
| Rawhide Legging Panel | 10.73 | |
| Soft Harpoon | 10.57 | |
| Orichalcum Chain | 10.57 | |
| Steel Pistol Barrel | 10.40 | |
| Hardened Shoulderguard Panel | 10.39 | |
| Gossamer Gloves Panel | 10.08 | |
| Iron Shield Backing | 10.04 | |
| Gossamer Helm Strap | 10.02 | |
| Simple Tailor's Tools | 10.02 | |
| Steel Rifle Barrel | 9.91 | |
| Steel Hammer Head | 9.90 | |
| Ancient Short-Bow Stave | 9.88 | |
| Linen Coat Panel | 9.88 | |
| Hard Scepter Rod | 9.80 | |
| Hardened Boot Sole | 9.78 | |
| Orichalcum Sword Hilt | 9.66 | |
| Oiled Hardened String | 9.50 | |
| Orichalcum Spear Head | 9.42 | |
| Thin Legging Panel | 9.32 | |
| Linen Helm Strap | 9.18 | |
| Steel Splint Pauldron Casing | 8.76 | |
| Iron Scale Legging Lining | 8.67 | |
| Iron Scale Chest Panel | 8.61 | |
| Iron Pauldron Casing | 8.49 | |
| Hardened Helmet Strap | 8.18 | |
| Orichalcum Legging Panel | 8.11 | |
| Bolt of Embroidered Silk | 8.04 | |
| Steel Splint Boot Casing | 7.96 | |
| Seasoned Staff Shaft | 7.51 | |
| Steel Splint Gauntlet Plates | 7.49 | |
| Steel Mace Head | 7.48 | |
| Darksteel Horn | 7.42 | |
| Rugged Boot Upper | 7.34 | |
| Green Staff Shaft | 7.30 | |
| Steel Shield Boss | 7.27 | |
| Hardened Glove Panel | 7.11 | |
| Bronze Greatsword Hilt | 7.05 | |
| Darksteel Pauldron Casing | 6.73 | |
| Steel Shield Backing | 6.63 | |
| Hard Harpoon | 6.30 | |
| Simple Huntsman's Tools | 6.28 | |
| Iron Scale Boot Panel | 6.27 | |
| Rugged Shoulderguard Panel | 6.24 | |
| Steel Dagger Blade | 6.21 | |
| Jute Breeches Panel | 6.13 | |
| Bronze Dagger Blade | 6.11 | |
| Steel Spear Head | 5.93 | |
| Wool Headpiece Strap | 5.80 | |
| Iron Sword Hilt | 5.75 | |
| Bronze Torch Head | 5.62 | |
| Bronze Chain Chest Panel | 5.54 | |
| Tub of Wood Glue | 5.49 | |
| Iron Casque Lining | 5.45 | |
| Green Short-Bow Stave | 5.45 | |
| Iron Hammer Head | 5.43 | |
| Bowl of Cream Soup Base | 5.41 | |
| Iron Rifle Barrel | 5.26 | |
| Steel Torch Head | 5.25 | |
| Small Hard Haft | 5.22 | |
| Seasoned Rifle Stock | 5.16 | |
| Iron Scale Chest Padding | 5.11 | |
| Steel Trident Head | 5.04 | |
| Iron Horn | 4.96 | |
| Green Scepter Rod | 4.88 | |
| Wool Epaulet Padding | 4.85 | |
| Pile of Simple Chili Seasoning | 4.72 | |
| Thin Glove Lining | 4.45 | |
| Bronze Spear Head | 4.08 | |
| average | 12.49 | — |
| CraftingMaterial - Exotic | ||
| Mordant Inscription | 39.95 | |
| Inscription of the Spearmarshal | 36.90 | |
| Smell-Enhancing Culture | 28.17 | |
| Insignia of the Spearmarshal | 26.13 | |
| Insignia of the Harrier | 15.34 | |
| average | 29.30 | — |
| Consumable - Fine | ||
| Recipe: Potent Master Tuning Crystals | 39.92 | |
| Recipe: Bowl of Refugee's Beet Soup | 39.92 | |
| Writ of Studied Strength | 39.57 | |
| Bowl of Cactus Fruit Salad | 38.24 | |
| Recipe: Toxic Focusing Crystal | 38.19 | |
| Thesis on Studied Accuracy | 35.28 | |
| Prickly Pear Stuffed Nopal | 33.65 | |
| Thesis on Calculated Accuracy | 30.75 | |
| Writ of Accuracy | 28.09 | |
| Writ of Basic Speed | 27.57 | |
| Writ of Calculated Malice | 26.84 | |
| Thesis on Basic Accuracy | 25.17 | |
| Recipe: Bowl of Zesty Turnip Soup | 19.46 | |
| Cheese Pizza | 18.58 | |
| Writ of Basic Strength | 13.16 | |
| Writ of Basic Accuracy | 12.77 | |
| Minor Potion of Destroyer Slaying | 11.64 | |
| Thesis on Basic Malice | 11.53 | |
| Bowl of Candy Corn Ice Cream | 10.44 | |
| Eggs Beetletun | 7.16 | |
| Canopy Dye | 6.57 | |
| Writ of Basic Malice | 5.88 | |
| Potion of Flame Legion Slaying | 5.68 | |
| Thesis on Basic Strength | 5.60 | |
| Potion of Undead Slaying | 4.61 | |
| average | 21.45 | — |
| Consumable - Rare | ||
| Grave Dye | 39.86 | |
| White Tiger Staff Skin | 39.57 | |
| Bloodstone Dark Coral | 39.24 | |
| Rogue Grymm Svaard | 38.15 | |
| Arcane Dye | 38.15 | |
| Improvised Dagger Skin | 35.02 | |
| Blue Orchid Dye | 34.98 | |
| Glacial Focus Skin | 34.31 | |
| Bloodstone Dark Indigo | 33.35 | |
| Mesa Dye | 31.56 | |
| Daybreak Dye | 31.23 | |
| White Tiger Axe Skin | 29.63 | |
| Toxin Dye | 29.00 | |
| Recipe: 20-Slot Equipment Pact Box | 28.93 | |
| Ruin Dye | 28.32 | |
| Bloodstone Indigo | 27.61 | |
| Gargoyle Shield Skin | 27.47 | |
| Draconic Rifle Skin | 26.90 | |
| Shadow Sword Skin | 26.44 | |
| Amenity Dye | 24.85 | |
| Perseverance Dye | 24.74 | |
| White Tiger Hammer Skin | 24.67 | |
| Gargoyle Longbow Skin | 24.62 | |
| Equinox Focus Skin | 24.12 | |
| Crushed Bone Dye | 23.71 | |
| White Tiger Shield Skin | 22.67 | |
| Red Crane Short Bow Skin | 22.53 | |
| Auric Dye | 21.52 | |
| Mystic Forge Node | 21.14 | |
| Sunfire Lava Dye | 19.99 | |
| Carnage Orange Dye | 18.41 | |
| Steampunk Tybalt | 18.24 | |
| Blue Steel Dye | 16.96 | |
| Draconic Longbow Skin | 16.68 | |
| Blacklight Dye | 16.28 | |
| Shiver Sky Dye | 15.26 | |
| Tar Dye | 15.13 | |
| Benevolence Dye | 15.04 | |
| Gallant Warhorn Skin | 14.96 | |
| Wind Catcher Skin | 14.49 | |
| Phalanx Turai Ossa | 13.80 | |
| Gallant Focus Skin | 13.39 | |
| Bloodstone Violet | 12.03 | |
| Priory Grymm Svaard | 11.22 | |
| Swampblack Dye | 11.11 | |
| Frozen Scales Dye | 11.03 | |
| Dragon's Jade Warhammer Skin | 9.54 | |
| Shadow Rifle Skin | 8.69 | |
| Quartz Dye | 8.58 | |
| Abyss Stalker Dagger Skin | 6.36 | |
| War God's Focus | 5.88 | |
| Draconic Dagger Skin | 5.69 | |
| Limonite Dye | 5.65 | |
| average | 21.86 | — |
| Consumable - Masterwork | ||
| Great Guild Firework | 39.52 | |
| Feast of Sage-Stuffed Poultry | 39.41 | |
| Tray of Chocolate Cherries | 37.47 | |
| Feast of Bean Salad | 37.09 | |
| Bowl of Passion Fruit Tapioca Pudding | 36.23 | |
| Feast of Garlic Spinach Sautee | 36.22 | |
| Tray of Garlic Bread | 34.13 | |
| Feast of Pepper Steak Dinners | 32.45 | |
| Winter Warband Festive Mortar | 31.66 | |
| Tray of Chocolate Bananas | 27.54 | |
| Pineapple Dye | 24.63 | |
| Mystery Quaggan Tonic | 21.66 | |
| Tray of Chocolate Oranges | 17.44 | |
| Watermelon Dye | 12.90 | |
| Mummy Tonic | 11.90 | |
| Violet Tint Dye | 10.06 | |
| Plate of Eggs Benedict | 7.99 | |
| Lunar New Year Firework | 5.49 | |
| Parrot Mail Carrier | 5.01 | |
| Raven Mail Carrier | 4.39 | |
| average | 23.66 | — |
| Weapon - Fine | ||
| Half Eaten Scepter | 37.86 | |
| Half Eaten Longbow | 29.98 | |
| Half Eaten Hammer | 23.31 | |
| Asuran Harpoon | 20.72 | |
| Copper Mace | 20.10 | |
| Half Eaten Rifle | 20.08 | |
| Half Eaten Torch | 16.02 | |
| Half-Eaten Short Bow | 12.18 | |
| Half Eaten Shield | 8.20 | |
| average | 20.94 | — |
| Container - Basic | ||
| Small Bag of Skritt Shinies | 36.34 | |
| Deciphered Clues | 30.77 | |
| Bag of Laboratory Materials | 26.94 | |
| Light Icy Bag | 15.94 | |
| average | 27.50 | — |
| CraftingMaterial - Rare | ||
| Honed Intricate Cotton Insignia | 35.85 | |
| Strong Steel Imbued Inscription | 35.03 | |
| Vigorous Iron Imbued Inscription | 31.61 | |
| Tengu Echo Blade | 28.57 | |
| average | 32.77 | — |
| CraftingMaterial - Masterwork | ||
| Giver's Embroidered Silk Insignia | 35.26 | |
| Practical Leatherworker's Tools | 26.87 | |
| Bag of Radiant Energy | 24.10 | |
| Bag of Incandescent Energy | 18.64 | |
| Bag of Dolyak Chow | 16.18 | |
| Steel Reinforcing Plate | 10.27 | |
| Bag of Shimmering Energy | 10.17 | |
| Copper Reinforcing Plate | 7.47 | |
| average | 18.62 | — |
| Trophy - Rare | ||
| Giant Mushroom Spore | 33.63 | |
| Coastal Lumber Core | 18.05 | |
| Lamp Finial | 10.22 | |
| Dollop of Choya Harissa | 8.30 | |
| Visage of Dwayna | 7.43 | |
| Ash Legion Key | 5.93 | |
| Jungle Grass Seed | 5.15 | |
| Palm Lumber Core | 4.31 | |
| average | 11.63 | — |
| Bag - Fine | ||
| 15 Slot Invisible Pack | 32.66 | |
| 15 Slot Invisible Bag | 28.98 | |
| 12 Slot Craftsman's Bag | 28.10 | |
| 15 Slot Oiled Pack | 27.27 | |
| average | 29.25 | — |
| Container - Fine | ||
| Box of Simple Mighty Chain Armor | 30.32 | |
| average | 30.32 | — |
| Trophy - Masterwork | ||
| Amber Quantic Dipole | 28.95 | |
| average | 28.95 | — |
| CraftingMaterial - Fine | ||
| Sturdy Tailor's Tools | 27.08 | |
| Sturdy Weaponsmith's Tools | 8.46 | |
| Sturdy Leatherworker's Tools | 8.30 | |
| average | 14.61 | — |
| UpgradeComponent - Masterwork | ||
| Minor Rune of the Adventurer | 26.26 | |
| Minor Rune of the Brawler | 23.28 | |
| Minor Rune of the Scholar | 21.81 | |
| Minor Rune of the Mesmer | 18.59 | |
| Minor Rune of Perplexity | 12.24 | |
| Minor Rune of the Warrior | 10.00 | |
| Minor Rune of the Guardian | 9.20 | |
| Minor Rune of the Thief | 7.67 | |
| Minor Rune of the Ogre | 6.83 | |
| Minor Rune of Altruism | 6.12 | |
| Minor Rune of the Elementalist | 5.08 | |
| average | 13.37 | — |
| MiniPet - Exotic | ||
| Mini GL-XC S7S | 26.21 | |
| Mini Charles the Hellfire Skeleton | 10.18 | |
| average | 18.19 | — |
| Gizmo - Masterwork | ||
| Masterwork Black Lion Dye Canister—Yellow | 24.72 | |
| Masterwork Black Lion Dye Canister—Red | 16.82 | |
| Endless Common Clothing Tonic | 4.20 | |
| average | 15.25 | — |
| Consumable - Basic | ||
| Offering Basket | 23.39 | |
| Small Dusty Wintersday Gift | 15.18 | |
| average | 19.28 | — |
| Trophy - Basic | ||
| Skritt Artifact | 16.81 | |
| Hylek Armor | 7.03 | |
| average | 11.92 | — |
| Gizmo - Basic | ||
| Capacitive Bottle | 16.71 | |
| Guild Siege Golem Blueprints | 14.84 | |
| Guild Shield Generator Blueprint | 13.05 | |
| Guild Flame Ram Blueprint | 7.30 | |
| Guild Trebuchet Blueprints | 6.31 | |
| average | 11.64 | — |
| Container - Exotic | ||
| Unopened Endless Branded Tonic | 14.50 | |
| average | 14.50 | — |
| Weapon - Exotic | ||
| Staff of the Lost of Severance | 14.25 | |
| Might of the Vindictive of Severance | 14.09 | |
| Forged Bow | 13.21 | |
| Gate of Good-Byes of Severance | 10.52 | |
| average | 13.02 | — |
| Trophy - Exotic | ||
| Pile of Jacarandere | 13.58 | |
| average | 13.58 | — |
| Trophy - Ascended | ||
| Heat Stone | 12.94 | |
| average | 12.94 | — |
| Container - Masterwork | ||
| Unopened Crystal Shard Kite | 12.77 | |
| average | 12.77 | — |
| Gizmo - Exotic | ||
| Infinite Molten Berserker Tonic | 8.77 | |
| Endless Halloween Tonic | 7.37 | |
| average | 8.07 | — |
| Consumable - Exotic | ||
| Visage of the Great Ram Firework | 7.64 | |
| Visage of the Great Monkey Firework | 6.62 | |
| Recipes of the Dwarven Leather Trader | 4.43 | |
| Twisted Watchwork Portal Device | 4.32 | |
| average | 5.75 | — |
| Armor - Fine | ||
| Darkvine Cloth Gloves | 7.26 | |
| average | 7.26 | — |
| Based on data from api.guildwars2.com | ||
| 1 Prices in silver | ||
To make the data split reproducible, we set a seed. We want to predict the gold price based on attributes of the items.
set.seed(42)
# Put 3/4 of the data into the training set
data_split <- initial_split(df_all,
prop = 3/4,
strata = profit,
breaks = 4)
# Create dataframes for the two sets:
train_data <- training(data_split)
test_data <- testing(data_split)
Create validation set (used during modeling)
cv_folds <-
vfold_cv(train_data,
v=5,
strata = profit,
breaks = 4)
df_train <- train_data
Now work with the train data.
df_train %>%
select(where(is.numeric), -more_sells, -profit, -unit_price_gold_diff) %>% # only select numerical data
vis_cor(cor_method = "spearman", na_action = "pairwise.complete.obs")
We see that our data is pretty uncorrelated which makes it hard to find a good classification model. What we see is that buy and sell price are somewhat correlated as well as the level of the item to the vendor value.
The latter makes sense, as there must be some kind of algorithm that sets the sell price, probably also based on the item’s level.
Following idea: A model that predicts the sell price based on the buy price. Then we search for outliers where the sell price was predicted too high. Probably those items are underrated somehow?
Let’s have a look at the distribution of profit in general for these items.
df_train_distribution <- df_train %>%
group_by(name) %>%
summarise(mean_profit = mean(profit),
type = unique(type),
rarity = unique(rarity)) %>%
arrange(desc(mean_profit))
df_train_distribution %>%
ggplot() +
geom_histogram(aes(x = mean_profit, fill = type), stat="count") +
scale_x_binned(limits = c(0, 0.4)) +
labs(x = "Mean profit", y = "Count",
title = "Item profit distribution", subtitle = "Items by profit, in gold",
caption = paste("Data from", params$data_date))
## Warning: Ignoring unknown parameters: binwidth, bins, pad
To get a better overview how the item buy costs and profits relate to each other, we’ll do a cluster analysis.
(taken from https://www.kirenz.com/post/2020-05-21-r-hierarchische-clusteranalyse/)
df_cl <- df_train %>%
select(c("id", "profit", "unit_price_gold_buys"))
df_cl$profit <- scale(df_cl$profit, center = TRUE, scale = TRUE)
df_cl$unit_price_gold_buys <- scale(df_cl$unit_price_gold_buys, center = TRUE, scale = TRUE)
d <-
df_cl %>%
select(-id) %>%
dist(method = "euclidean")
hc <- hclust(d, method = "ward.D2")
plot(hc)
The dendrogram displays the number of clusters. The higher the number,
the less similar are the clusters to each other. Because there is a big
gap between ~ 5 and ~ 10 which would result in only two clusters, four
clusters seem fine.
We can also try six clusters, as two of the clusters seem to be pretty small.
hc$labels <- df_cl$id
grp <- cutree(hc, k = 4)
df_cl$cluster <- grp
df_cl %>%
ggplot(aes(unit_price_gold_buys,
profit,
color = factor(cluster))) +
geom_point() +
# geom_text(aes(label = id), size = 3, check_overlap = FALSE, vjust = 0, nudge_y = 0.1) +
xlab("Buy Costs") +
ylab("Profit") +
theme(legend.title=element_blank())
rm(d)
rm(hc)
rm(grp)
Because we scaled the numbers we can’t really say something about the real value of the items, but most of them are in the lower cost / lower profit range.
Four clusters were created out of the data:
The profit is not a question of buy costs, there are items with high profit for low and for high costs.
There are some items that are cheap but will bring good profit.
Starting with a low budget, we can’t buy a lot of high-priced items. Therefore it’s good to know which items get the most profit compared to their costs. We always suppose the worst profit.
df_train_high_roi <- df_train %>%
group_by(name) %>%
summarise(
id = unique(id),
profitByCost = min(profit) / max(unit_price_gold_buys),
profit = min(profit),
cost = max(unit_price_gold_buys) * 100,
sell = min(unit_price_gold_sells) * 100,
quantity = min(quantity_buys)
) %>%
ungroup() %>%
arrange(desc(profitByCost))
df_train_high_roi %>%
ggplot(aes(x = cost, y = profit)) +
geom_point() +
geom_smooth(method='lm', formula= y~x) +
labs(x = "Cost", y = "Profit",
title = "Cost and Profit of Items", subtitle = "Prices in gold",
caption = paste("Data from", params$data_date))
This is basically the same picture we saw earlier on cluster analysis, but with the real profit values.
df_train_high_roi %>%
mutate(
profitByCost = round(profitByCost, digits = 2),
profit = round(profit, digits = 2),
cost = round(cost, digits = 2),
sell = round(sell, digits = 2)
) %>%
filter(profit > 0) %>%
select(name, profitByCost, profit, cost, sell, quantity) %>%
datatable(extensions = c('ColReorder', 'Buttons', 'Responsive', 'Scroller', 'SearchPanes', 'Select'),
options = list(colReorder = TRUE,
dom = 'Bfrtip',
buttons = list('searchPanes', 'copy', list(
extend = 'collection',
buttons = c('csv', 'excel', 'pdf'),
text = 'Download'))),
escape = FALSE,
colnames=c("Name", "Profit by cost", "Profit", "Cost", "Sell", "Quantity")
)
Great, we found items with very high profit. Let’s buy some of them that also have a high quantity.